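Medical relation extraction (MRE) aims to extract relations between entities in medical texts. Traditional relation extraction methods do so by exploiting syntactic information, such as dependency trees. However, out-of-domain parsers produce 1-best dependency trees of relatively limited quality for medical texts, so the performance of medical relation extraction methods may degrade. To this end, we propose a method based on causal explanation theory that jointly models semantic and syntactic information in medical texts. We generate dependency forests composed of 1-best dependency trees. A task-specific causal explainer then prunes the dependency forests, and the pruned forests are fed into a purpose-designed graph convolutional network to learn the corresponding representations for the downstream task. Empirically, various comparisons on benchmark medical datasets demonstrate the effectiveness of our model.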
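The pipeline above (prune a dependency forest, then run a graph convolutional network over the remaining edges) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' model: the explainer scores, the `threshold` value, and the helper names `prune_forest` and `gcn_layer` are hypothetical, and the layer uses a generic row-normalized GCN update ReLU(D^-1 (A + I) H W) because the abstract does not specify the exact formulation.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (head token index, dependent token index)


def prune_forest(edge_scores: Dict[Edge, float], threshold: float) -> List[Edge]:
    """Keep forest edges whose (hypothetical) explainer score reaches the threshold."""
    return [edge for edge, score in edge_scores.items() if score >= threshold]


def gcn_layer(
    n: int, edges: List[Edge], h: List[List[float]], w: List[List[float]]
) -> List[List[float]]:
    """One generic GCN layer, ReLU(D^-1 (A + I) H W), treating edges as undirected."""
    neighbors = [{i} for i in range(n)]  # self-loops implement the "+ I" term
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    d_in, d_out = len(h[0]), len(w[0])
    out = []
    for i in range(n):
        # Mean-aggregate neighbour features (row-normalized adjacency D^-1 (A + I)).
        agg = [sum(h[j][k] for j in neighbors[i]) / len(neighbors[i]) for k in range(d_in)]
        # Linear projection by W followed by an element-wise ReLU.
        out.append([max(0.0, sum(agg[k] * w[k][c] for k in range(d_in))) for c in range(d_out)])
    return out


# Usage on a toy 3-token forest with made-up explainer scores.
kept = prune_forest({(0, 1): 0.9, (1, 2): 0.2, (0, 2): 0.7}, threshold=0.5)
# kept == [(0, 1), (0, 2)]
print(gcn_layer(3, kept, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1.0], [1.0]]))
```

In a real system the explainer scores would come from a learned model and the layer would be stacked and trained end to end; the sketch only shows how pruning changes which neighbours each token aggregates.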
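Few-shot learning models learn from limited human annotations, and this learning paradigm has proven practical in various tasks. However, the scarce data prevents the model from fully exploring semantic information. To address this issue, we introduce knowledge distillation into the few-shot object detection learning paradigm. We further run a motivating experiment, which shows that during knowledge distillation, the empirical error of the teacher model degrades the prediction performance of the few-shot object detection model acting as the student. To understand the reason behind this phenomenon, we revisit the learning paradigm of knowledge distillation on the few-shot object detection task from the perspective of causal theory, and accordingly develop a structural causal model. Following this theoretical guidance, we propose a backdoor adjustment-based knowledge distillation method for few-shot object detection, namely Disentangle and Remerge (D&R), which performs conditional causal intervention on the corresponding structural causal model. Theoretically, we provide an extended definition for the backdoor criterion, the general backdoor path, which can expand the theoretical application boundary of the backdoor criterion in specific cases. Empirically, experiments on multiple benchmark datasets demonstrate that D&R yields significant performance gains in few-shot object detection.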
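The backdoor adjustment that D&R builds on is the standard identity P(Y = y | do(X = x)) = Σ_z P(Y = y | X = x, Z = z) P(Z = z), valid when Z satisfies the backdoor criterion. The sketch below computes it on a hypothetical toy distribution with binary X, Y and a confounder Z; it illustrates the general formula only and is not the D&R method or its extended "general backdoor path" definition. The function names and the toy probabilities are made up for illustration.

```python
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int, int], float]  # P(X=x, Y=y, Z=z) keyed by (x, y, z)


def make_joint() -> Joint:
    """Hypothetical toy model in which Z confounds both X and Y."""
    joint: Joint = {}
    for z in (0, 1):
        p_z = 0.5
        p_x1 = 0.8 if z == 1 else 0.2          # Z pushes X towards 1
        for x in (0, 1):
            p_x = p_x1 if x == 1 else 1 - p_x1
            p_y1 = 0.1 + 0.2 * x + 0.6 * z     # Y depends on both X and Z
            for y in (0, 1):
                p_y = p_y1 if y == 1 else 1 - p_y1
                joint[(x, y, z)] = p_z * p_x * p_y
    return joint


def backdoor_adjust(joint: Joint, x: int, y: int) -> float:
    """P(Y=y | do(X=x)) = sum_z P(Y=y | X=x, Z=z) * P(Z=z), assuming Z blocks all backdoor paths."""
    total = 0.0
    for z in {zz for (_, _, zz) in joint}:
        p_z = sum(p for (_, _, zz), p in joint.items() if zz == z)
        p_xz = sum(p for (xx, _, zz), p in joint.items() if xx == x and zz == z)
        if p_xz == 0.0 and p_z > 0.0:
            raise ValueError("positivity violated: P(X=x, Z=z) = 0 for some z with P(Z=z) > 0")
        if p_z > 0.0:
            total += joint.get((x, y, z), 0.0) / p_xz * p_z
    return total


def conditional(joint: Joint, x: int, y: int) -> float:
    """Plain observational P(Y=y | X=x), which is biased by the confounder."""
    p_x = sum(p for (xx, _, _), p in joint.items() if xx == x)
    p_xy = sum(p for (xx, yy, _), p in joint.items() if xx == x and yy == y)
    return p_xy / p_x


joint = make_joint()
print(backdoor_adjust(joint, x=1, y=1), conditional(joint, x=1, y=1))
```

In this toy model the interventional probability is about 0.6 while the observational one is about 0.78, which shows how conditioning alone overstates the effect of X when a confounder is present.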
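Sign Language Production (SLP) aims to automatically translate spoken language into sign sequences. The core process of SLP is transforming a sign gloss sequence into its corresponding sign pose sequence (G2P). Most existing G2P models perform this conditional long-range generation autoregressively, which inevitably leads to error accumulation. To address this issue, we propose a vector quantized diffusion method for conditional pose sequence generation, called PoseVQ-Diffusion, which is an iterative non-autoregressive method. Specifically, we first introduce a vector quantized variational autoencoder (Pose-VQVAE) model to represent a pose sequence as a sequence of latent codes. We then model the latent discrete space through an extension of the recently developed diffusion architecture. To better exploit spatial-temporal information, we introduce a novel architecture, namely CodeUnet, to generate higher-quality pose sequences in the discrete space. Moreover, using the learned codes, we develop a novel sequential k-nearest-neighbours method to predict the variable length of the pose sequence for the corresponding gloss sequence. Consequently, compared with autoregressive G2P models, our model samples faster and produces significantly better results. Compared with previous non-autoregressive G2P methods, PoseVQ-Diffusion improves the predicted results through iterative refinement, thereby achieving state-of-the-art results on the SLP evaluation benchmark.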
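Representing a pose sequence "as a sequence of latent codes" relies on the standard VQ-VAE quantization step: each continuous encoder output is replaced by its nearest codebook vector, and the index of that vector becomes the discrete code. The sketch below shows only that lookup under stated assumptions; the per-frame latents, the 2-D codebook values, and the helper name `quantize` are hypothetical, and Pose-VQVAE's actual encoder, codebook size, and training losses are not described in the abstract.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def quantize(z_e: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each continuous latent to its nearest codebook entry under squared L2 distance.

    Returns the discrete code indices (the "sequence of latent codes") and the
    quantized vectors that a VQ-VAE decoder would consume.
    """
    indices: List[int] = []
    z_q: List[Vector] = []
    for z in z_e:
        dists = [sum((a - b) ** 2 for a, b in zip(z, c)) for c in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        z_q.append(codebook[k])
    return indices, z_q


# Usage: three hypothetical per-frame latents and a toy 3-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (0.2, 1.7)]
codes, _ = quantize(frames, codebook)
print(codes)  # [0, 1, 2]
```

A discrete diffusion model such as the one described then operates on index sequences like `codes` rather than on raw pose coordinates.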
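Automatic open-domain dialogue evaluation is a crucial component of dialogue systems. Recently, learning-based evaluation metrics have achieved state-of-the-art performance in open-domain dialogue evaluation. However, these metrics focus on only a few qualities and therefore struggle to evaluate dialogues comprehensively. Moreover, they lack an effective score composition method for combining diverse evaluation qualities. To address these problems, we propose Multi-Metric Evaluation based on Correlation Re-Scaling (MME-CRS) for evaluating open-domain dialogue. First, we build an evaluation metric composed of 5 groups of parallel sub-metrics, called Multi-Metric Evaluation (MME), to evaluate dialogue quality comprehensively. Furthermore, we propose a novel score composition method called Correlation Re-Scaling (CRS) to model the relationship between the sub-metrics and the diverse qualities. Our approach, MME-CRS, ranks first on the final test data of the DSTC10 Track 5 Subtask 1 Automatic Open-domain Dialogue Evaluation Challenge, which demonstrates the effectiveness of the proposed method.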
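This paper tackles the cross-domain semantic segmentation task, which aims to improve segmentation accuracy on an unlabeled target domain without incurring additional annotation cost. Building on the pseudo-label-based unsupervised domain adaptation (UDA) pipeline, we propose a novel and effective Multiple Fusion Adaptation (MFA) method. MFA considers three parallel information fusion strategies: cross-model fusion, temporal fusion, and a novel online-offline pseudo-label fusion. Specifically, the online-offline pseudo-label fusion encourages the adaptive training to pay additional attention to difficult regions that offline pseudo labels tend to overlook, thereby retaining more informative details. Although the other two fusion strategies may look standard, MFA strives to improve the efficiency and effectiveness of their integration and successfully injects all three strategies into a unified framework. Experiments on two widely used benchmarks, GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes, show that our method significantly improves semantic segmentation adaptation and sets a new state of the art (58.2% and 62.5% mIoU, respectively). The code will be available at https://github.com/kaizhang/mfa.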
The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction, which requires complicated indexing of local image-view features via 3D-to-2D projection. Other methods implicitly introduce geometric positional encoding and perform global attention (e.g., PETR) to build the relationship between image tokens and 3D objects. The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence. We propose Focal-PETR, with instance-guided supervision and a spatial alignment module, to adaptively focus object queries on discriminative foreground regions. Focal-PETR additionally introduces a down-sampling strategy to reduce the cost of global attention. Owing to the highly parallelized implementation and the down-sampling strategy, our model, without depth supervision, achieves leading performance on the large-scale nuScenes benchmark and a superior speed of 30 FPS on a single RTX3090 GPU. Extensive experiments show that our method outperforms PETR while requiring 3x fewer training hours. The code will be made publicly available.
Video super-resolution (VSR), which aims to reconstruct a high-resolution (HR) video from its low-resolution (LR) counterpart, has made tremendous progress in recent years. However, it remains challenging to deploy existing VSR methods on real-world data with complex degradations. On the one hand, there are few well-aligned real-world VSR datasets, especially with large super-resolution scale factors, which limits the development of real-world VSR tasks. On the other hand, the alignment algorithms in existing VSR methods perform poorly on real-world videos, leading to unsatisfactory results. As an attempt to address these issues, we build a real-world 4$\times$ VSR dataset, namely MVSR4$\times$, in which the low- and high-resolution videos are captured with smartphone lenses of different focal lengths, respectively. Moreover, we propose an effective alignment method for real-world VSR, namely EAVSR. EAVSR uses the proposed multi-layer adaptive spatial transform network (MultiAdaSTN) to refine the offsets provided by a pre-trained optical flow estimation network. Experimental results on the RealVSR and MVSR4$\times$ datasets show the effectiveness and practicality of our method, which achieves state-of-the-art performance on the real-world VSR task. The dataset and code will be publicly available.
Variational Graph Autoencoders (VGAEs) are powerful models for unsupervised learning of node representations from graph data. In this work, we systematically analyze how VGAEs model node attributes and show that attribute decoding is important for node representation learning. We further propose a new learning model, interpretable NOde Representation with Attribute Decoding (NORAD). The model encodes node representations in an interpretable way: node representations capture community structures in the graph and the relationship between communities and node attributes. We further propose a rectifying procedure to refine the representations of isolated nodes, improving their quality. Our empirical results demonstrate the advantage of the proposed model when learning graph data in an interpretable way.
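For context, a standard VGAE samples each node embedding with the reparameterization trick, z = μ + σ ⊙ ε with ε ~ N(0, I), and decodes edges with an inner-product decoder, p(A_ij = 1) = σ(z_i · z_j). The sketch below shows those two pieces plus a hypothetical attribute decoder (`attribute_probs`, a per-attribute logistic read-out with made-up weights) to illustrate what "attribute decoding" can mean; NORAD's actual decoder is not specified in the abstract, so this is an assumption, not the authors' model.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reparameterize(mu: Sequence[float], log_sigma: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the VGAE reparameterization trick)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]


def edge_prob(z_i: Sequence[float], z_j: Sequence[float]) -> float:
    """Inner-product edge decoder: p(A_ij = 1) = sigmoid(z_i . z_j)."""
    return sigmoid(sum(a * b for a, b in zip(z_i, z_j)))


def attribute_probs(z_i: Sequence[float], w: List[Sequence[float]]) -> List[float]:
    """Hypothetical attribute decoder: p(X_ik = 1) = sigmoid(z_i . w_k) for binary attributes."""
    return [sigmoid(sum(a * b for a, b in zip(z_i, w_k))) for w_k in w]


# Usage with made-up encoder outputs for two nodes.
rng = random.Random(0)
z0 = reparameterize([1.0, 0.0], [-2.0, -2.0], rng)
z1 = reparameterize([0.9, 0.1], [-2.0, -2.0], rng)
print(edge_prob(z0, z1), attribute_probs(z0, [[1.0, 0.0], [0.0, 1.0]]))
```

Training would maximize the likelihood of both the observed adjacency and the observed attributes under these decoders, minus a KL term on the latent distribution.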
In this paper, we explore the feasibility of utilizing a mmWave radar sensor installed on a UAV to reconstruct the 3D shapes of multiple objects in a space. The UAV hovers at various locations in the space, and its onboard radar sensor collects raw radar data by scanning the space with Synthetic Aperture Radar (SAR) operation. The radar data is fed to a deep neural network model, which outputs a point cloud reconstruction of the multiple objects in the space. We evaluate two different models. Model 1 is our recently proposed 3DRIMR/R2P model, and Model 2 is formed by adding a segmentation stage to the processing pipeline of Model 1. Our experiments demonstrate that both models are promising for solving the multiple object reconstruction problem. We also show that Model 2, despite producing denser and smoother point clouds, can lead to higher reconstruction loss or even the loss of objects. In addition, we find that both models are robust to the highly noisy radar data obtained by unstable SAR operation, caused by the instability or vibration of a small UAV hovering at its intended scanning point. Our exploratory study shows a promising direction for applying mmWave radar sensing to 3D object reconstruction.
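Optimal transport (OT) offers a versatile framework for comparing complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-dependent domain discretization and suffer from the curse of dimensionality. We present Geonet, a mesh-invariant deep neural operator network that learns the non-linear mapping from an input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, Geonet learns the saddle-point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces, which are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time prediction in online learning settings. We demonstrate that Geonet achieves testing accuracy comparable to standard OT solvers on simulated examples and the CIFAR-10 dataset, with considerably lower inference-stage computational cost.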
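To make "Wasserstein geodesic" concrete: in one dimension, for two empirical distributions with the same number of equally weighted samples, the optimal coupling under squared cost simply matches sorted samples, so W2 and the geodesic (McCann's displacement interpolation) have closed forms. The sketch below uses that classical 1-D special case; it is not Geonet, which learns the geodesic operator for general settings, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def w2_distance_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W2 between two equal-size, uniformly weighted 1-D empirical measures.

    In 1-D the optimal transport plan pairs the i-th smallest x with the i-th smallest y.
    """
    a, b = sorted(xs), sorted(ys)
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample counts")
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def geodesic_1d(xs: Sequence[float], ys: Sequence[float], t: float) -> List[float]:
    """Support points of the W2 geodesic at time t in [0, 1] (displacement interpolation)."""
    a, b = sorted(xs), sorted(ys)
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample counts")
    return [(1 - t) * u + t * v for u, v in zip(a, b)]


# Usage: the geodesic moves each sorted sample along a straight line.
src, dst = [0.0, 1.0, 2.0], [3.0, 4.0, 5.0]
print(w2_distance_1d(src, dst))    # 3.0
print(geodesic_1d(src, dst, 0.5))  # [1.5, 2.5, 3.5]
```

A useful sanity check of the geodesic property is that W2(μ_0, μ_t) = t · W2(μ_0, μ_1); in higher dimensions no such sorting shortcut exists, which is why PDE-based or learned solvers are needed.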